6.2 Constraint
More formally, the relative entropy (Kullback–Leibler divergence) 16 between two
(discrete) distributions with probability functions $a_k$ and $b_k$ is
$$R(a, b) = \sum_k a_k \log_2 (a_k / b_k) . \qquad (6.19)$$
If $a_k$ is an actual distribution of observations, and $b_k$ is a model description approximating
the data, 17 then $R(a, b)$ is the expected difference (expressed as a number
of bits) between encoding samples from $a_k$ using a code based on $a$ and using a code
based on $b$. This can be seen by writing Eq. (6.19) as
$$R(a, b) = -\sum_k a_k \log_2 b_k + \sum_k a_k \log_2 a_k , \qquad (6.20)$$
where the first term on the right-hand side is called the cross-entropy of $a_k$ and $b_k$,
the expected number of bits required to encode observations from $a$ when using a
code based on $b$ rather than $a$. Conversely, $R(a, b)$ is the gain in information if a
code based on $a$ rather than $b$ is used.
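These quantities are straightforward to compute numerically. A minimal Python sketch (the function names are mine, not from the text), with the divergence expressed in bits as in Eq. (6.19):

```python
import math

def relative_entropy(a, b):
    """Kullback-Leibler divergence R(a, b) in bits, Eq. (6.19).
    a and b are probability distributions over the same outcomes."""
    return sum(ak * math.log2(ak / bk) for ak, bk in zip(a, b) if ak > 0)

def cross_entropy(a, b):
    """Expected bits to encode observations from a using a code based on b."""
    return -sum(ak * math.log2(bk) for ak, bk in zip(a, b) if ak > 0)

def entropy(a):
    """Shannon entropy of a, in bits."""
    return -sum(ak * math.log2(ak) for ak in a if ak > 0)

# Decomposition of Eq. (6.20): R(a, b) = cross-entropy minus entropy of a.
a = [0.9, 0.1]
b = [0.5, 0.5]
print(relative_entropy(a, b))                      # divergence of a from b
print(cross_entropy(a, b) - entropy(a))            # same value, via Eq. (6.20)
print(relative_entropy(b, a))                      # differs: R is not symmetric
```

The last line illustrates the point of footnote 16: since $R(a, b) \neq R(b, a)$ in general, the quantity is a divergence rather than a metric distance.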
Suppose that $P\{x_1, x_2, \ldots, x_m\}$ is the probability of having a certain pattern
(arrangement), or $m$-gram $x_1, x_2, \ldots, x_m$, 18 assumed to be ergodic (stationary
stochastic). 19 Examples could be the English texts studied by Shannon; of particular
relevance to the topic of this book is the problem of predicting the nucleic acid
base following a known (sequenced) arrangement. The conditional probability 20 that
the pattern [$(m-1)$-gram] $x_1, x_2, \ldots, x_{m-1}$ is followed by the symbol $x_m$ is
$$P\{x_m \mid x_1, x_2, \ldots, x_{m-1}\} = \frac{P\{x_1, x_2, \ldots, x_{m-1}, x_m\}}{P\{x_1, x_2, \ldots, x_{m-1}\}} . \qquad (6.21)$$
The “$m$-length approximation” to the entropy $S_m$, defined as the average uncertainty
about the next symbol, is
$$S_m = -\sum_{x_1, x_2, \ldots, x_{m-1}} P\{x_1, x_2, \ldots, x_{m-1}\} \sum_{x_m} P\{x_m \mid x_1, x_2, \ldots, x_{m-1}\} \log P\{x_m \mid x_1, x_2, \ldots, x_{m-1}\} . \qquad (6.22)$$
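In practice $S_m$ can be estimated from a symbol sequence (e.g., a run of nucleic acid bases) by replacing the probabilities in Eqs. (6.21) and (6.22) with empirical $m$-gram frequencies. A Python sketch under that plug-in assumption (the function name is mine; logarithms are taken to base 2, so the result is in bits):

```python
from collections import Counter
import math

def m_length_entropy(seq, m):
    """Plug-in estimate of S_m, Eq. (6.22): the average uncertainty (bits)
    about the next symbol given the preceding (m-1)-gram, with probabilities
    replaced by empirical frequencies from seq."""
    n = len(seq) - m + 1
    mgrams = Counter(seq[i:i + m] for i in range(n))        # counts of x1..xm
    contexts = Counter(seq[i:i + m - 1] for i in range(n))  # counts of x1..xm-1
    s = 0.0
    for gram, count in mgrams.items():
        p_joint = count / n                      # P{x1, ..., xm}
        p_cond = count / contexts[gram[:-1]]     # Eq. (6.21)
        s -= p_joint * math.log2(p_cond)
    return s

# A strictly alternating sequence: the next base is certain given one
# predecessor, so S_2 = 0, while S_1 is the full one-symbol entropy of 1 bit.
print(m_length_entropy("ATATATAT", 2))   # next symbol fully determined
print(m_length_entropy("ATATATAT", 1))   # no context: 1 bit of uncertainty
```

Rewriting the double sum of Eq. (6.22) as a single sum over $m$-grams, $S_m = -\sum P\{x_1, \ldots, x_m\} \log P\{x_m \mid x_1, \ldots, x_{m-1}\}$, is what the loop above implements.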
16 Although sometimes called a “distance”, since $R(a, b) \neq R(b, a)$ it is not a true metric and is
therefore better called a divergence rather than a distance.
17 Possibly constructed a priori.
18 See also Sect. 13.1.
19 See Sect. 11.1.
20 See Sect. 9.2.2.